Differentially private deep learning has recently witnessed advances in computational efficiency and privacy-utility trade-off. We explore whether further improvements along the two axes are possible and provide affirmative answers leveraging two instantiations of \emph{group-wise clipping}. To reduce the compute time overhead of private learning, we show that \emph{per-layer clipping}, where the gradient of each neural network layer is clipped separately, allows clipping to be performed in conjunction with backpropagation in differentially private optimization. This results in private learning that is as memory-efficient and almost as fast per training update as non-private learning for many workflows of interest. While per-layer clipping with constant thresholds tends to underperform standard flat clipping, per-layer clipping with adaptive thresholds matches or outperforms flat clipping under given training epoch constraints, hence attaining similar or better task performance within less wall time. To explore the limits of scaling (pretrained) models in differentially private deep learning, we privately fine-tune the 175 billion-parameter GPT-3. We bypass scaling challenges associated with clipping gradients that are distributed across multiple devices with \emph{per-device clipping} that clips the gradient of each model piece separately on its host device. Privately fine-tuning GPT-3 with per-device clipping achieves a task performance at $\epsilon=1$ better than what is attainable by non-privately fine-tuning the largest GPT-2 on a summarization task.
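For intuition, here is a minimal PyTorch-style sketch (not the authors' implementation) contrasting flat clipping, which needs every layer's per-example gradient before the global norm can be computed, with per-layer clipping, which can run inside each layer's backward hook and discard that layer's per-example gradient immediately; thresholds and names are illustrative.

```python
import torch

def flat_clip(per_example_grads, C):
    """Standard flat clipping: one global threshold C over the concatenated
    per-example gradient, so all layers' gradients must be held at once."""
    total_norm = torch.sqrt(sum(g.flatten(1).pow(2).sum(dim=1)
                                for g in per_example_grads.values()))
    scale = (C / (total_norm + 1e-6)).clamp(max=1.0)          # one factor per example
    return {name: g * scale.view(-1, *([1] * (g.dim() - 1)))
            for name, g in per_example_grads.items()}

def per_layer_clip(layer_grad, C_layer):
    """Per-layer clipping: this layer's per-example gradients are clipped to
    their own threshold as soon as they are produced during backprop, so they
    can be accumulated and freed without waiting for the full backward pass."""
    norms = layer_grad.flatten(1).norm(dim=1)                 # one norm per example
    scale = (C_layer / (norms + 1e-6)).clamp(max=1.0)
    return layer_grad * scale.view(-1, *([1] * (layer_grad.dim() - 1)))
```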
This paper describes a new method for representing the embedding tables of graph neural networks (GNNs) more compactly via tensor-train (TT) decomposition. We consider the setting in which (a) the graph data lack node features, so embeddings must be learned during training, and (b) we wish to exploit GPU platforms, where smaller tables are desirable to reduce host-to-GPU communication even for large-memory GPUs. Using TT enables a compact parameterization of the embeddings, making them small enough to fit entirely on modern GPUs even for massive graphs. When combined with judicious initialization and hierarchical graph partitioning, this approach can reduce the size of the node embedding vectors by 1,659x to 81,362x on large publicly available benchmark datasets, while achieving comparable or better accuracy and significant speedups on multi-GPU systems. In some cases, our model, even without explicit node features on the input, can match the accuracy of models that use node features.
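A minimal sketch of the idea under assumed shapes and ranks (not the paper's code): the full embedding table is never materialized; each row is rebuilt on the fly by contracting one slice from each of three small TT cores.

```python
import torch
import torch.nn as nn

class TTEmbedding(nn.Module):
    """Toy tensor-train embedding: row index i is split into digits (i1, i2, i3)
    and the embedding vector is recovered by multiplying one slice per core,
    so only the small cores (not a num_rows x dim table) are stored."""

    def __init__(self, row_shape=(128, 128, 128), col_shape=(4, 8, 4), ranks=(16, 16)):
        super().__init__()
        self.row_shape = row_shape                    # product = number of rows covered
        r = (1, *ranks, 1)
        self.cores = nn.ParameterList([
            nn.Parameter(0.1 * torch.randn(r[k], row_shape[k], col_shape[k], r[k + 1]))
            for k in range(3)
        ])

    def forward(self, idx):                           # idx: (batch,) row indices
        n1, n2, n3 = self.row_shape
        digits = (idx // (n2 * n3), (idx // n3) % n2, idx % n3)
        out = torch.ones(idx.shape[0], 1, 1, device=idx.device)
        for core, d in zip(self.cores, digits):
            piece = core[:, d].permute(1, 0, 2, 3)    # (batch, r_prev, m_k, r_next)
            out = torch.einsum('bpr,brms->bpms', out, piece)
            out = out.reshape(idx.shape[0], -1, piece.shape[-1])
        return out.squeeze(-1)                        # (batch, 4*8*4) embedding vectors

emb = TTEmbedding()
vecs = emb(torch.randint(0, 128 ** 3, (32,)))         # 32 embeddings of dimension 128
```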
Differentially private stochastic gradient descent (DP-SGD) is the workhorse algorithm behind recent advances in private deep learning. It provides a single privacy guarantee for all data points in the dataset. We propose an efficient algorithm to compute the privacy guarantee of individual examples when releasing a model trained with DP-SGD. We use the algorithm to study individual privacy parameters across a number of datasets. We find that most examples enjoy stronger privacy guarantees than the worst case. We further find that an example's training loss and its privacy parameter are well correlated. This implies that groups that are underserved in terms of model utility are simultaneously underserved in terms of privacy guarantees. For example, on CIFAR-10, the average $\epsilon$ of the class with the lowest test accuracy is 26.3\% higher than that of the class with the highest accuracy. We also run membership inference attacks to show that this reflects disparate empirical privacy risks.
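As a rough illustration of individual accounting (a deliberately simplified sketch that composes the plain Gaussian mechanism over steps and ignores subsampling amplification, so the absolute numbers are loose), the per-example epsilon below shrinks when an example's gradient norms stay under the clipping threshold.

```python
import numpy as np

def individual_epsilon(per_step_grad_norms, clip_norm, noise_multiplier,
                       delta=1e-5, alphas=np.arange(2, 64, dtype=float)):
    """Toy per-example accountant: an example whose gradient norms stay below
    the clipping threshold has a smaller effective sensitivity at each step,
    and therefore a smaller individual epsilon."""
    sigma = noise_multiplier * clip_norm                    # noise std added by DP-SGD
    c = np.minimum(np.asarray(per_step_grad_norms, dtype=float), clip_norm)
    rdp = alphas * (c ** 2).sum() / (2.0 * sigma ** 2)      # composed Gaussian-mechanism RDP
    eps = rdp + np.log(1.0 / delta) / (alphas - 1.0)        # RDP -> (epsilon, delta) conversion
    return float(eps.min())

# An example with consistently small gradients vs. one that always hits the threshold:
print(individual_epsilon([0.2] * 1000, clip_norm=1.0, noise_multiplier=1.0))
print(individual_epsilon([1.5] * 1000, clip_norm=1.0, noise_multiplier=1.0))
```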
Recent work has shown that Pre-trained Language Models (PLMs) store the relational knowledge learned from data and utilize it for performing downstream tasks. However, commonsense knowledge across different regions may vary. For instance, the color of bridal dress is white in American weddings whereas it is red in Chinese weddings. In this paper, we introduce a benchmark dataset, Geo-Diverse Commonsense Multilingual Language Models Analysis (GeoMLAMA), for probing the diversity of the relational knowledge in multilingual PLMs. GeoMLAMA contains 3,125 prompts in English, Chinese, Hindi, Persian, and Swahili, with a wide coverage of concepts shared by people from American, Chinese, Indian, Iranian and Kenyan cultures. We benchmark 11 standard multilingual PLMs on GeoMLAMA. Interestingly, we find that 1) larger variants of multilingual PLMs do not necessarily store geo-diverse concepts better than their smaller counterparts; 2) multilingual PLMs are not intrinsically biased towards knowledge from the Western countries (the United States); 3) the native language of a country may not be the best language to probe its knowledge; and 4) a language may better probe knowledge about a non-native country than its native country. Code and data are released at https://github.com/WadeYin9712/GeoMLAMA.
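A minimal sketch of this style of prompt-based knowledge probing with a multilingual PLM (the prompt below is made up for illustration and is not taken from GeoMLAMA):

```python
from transformers import pipeline

# Fill-mask probing with a multilingual PLM; the prompt is illustrative only.
probe = pipeline("fill-mask", model="bert-base-multilingual-cased")
prompt = "In China, the bride usually wears a [MASK] dress at the wedding."
for candidate in probe(prompt, top_k=5):
    print(candidate["token_str"], round(candidate["score"], 3))
```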
Adversarial attacks on graphs pose a significant threat to the robustness of graph machine learning (GML) models. Naturally, an ever-escalating arms race exists between attackers and defenders. However, the strategies behind the two sides are often not fairly compared under the same, realistic conditions. To bridge this gap, we present the Graph Robustness Benchmark (GRB), which aims to provide a scalable, unified, modular, and reproducible evaluation of the adversarial robustness of GML models. GRB standardizes the processes of attack and defense by 1) developing scalable and diverse datasets, 2) modularizing attack and defense implementations, and 3) unifying the evaluation protocol across refined scenarios. By leveraging the GRB pipeline, end users can focus on developing robust GML models with automated data processing and experimental evaluation. To support open and reproducible research on graph adversarial learning, GRB also hosts public leaderboards across different scenarios. As a starting point, we conduct extensive experiments to benchmark baseline techniques. GRB is open-source and welcomes contributions from the community. Datasets, code, and leaderboards are available at https://cogdl.ai/grb/home.
Indiscriminate data poisoning attacks, which add imperceptible perturbations to training data in order to maximize the test error of the trained model, have become a trendy topic because they are believed to be capable of preventing unauthorized use of data. In this work, we investigate why these perturbations work in principle. We find that the perturbations of advanced poisoning attacks are almost \textbf{linearly separable} when assigned the target labels of the corresponding samples, and hence can work as \emph{shortcuts} for the learning objective. This important property of the perturbation population has not been unveiled before. Moreover, we further verify that linear separability is indeed the workhorse of poisoning attacks. We synthesize linearly separable data as perturbations and show that such synthetic perturbations are as powerful as the deliberately crafted attacks. Our finding suggests that the \emph{shortcut learning} problem is more serious than previously believed: deep learning heavily relies on shortcuts even when they are mixed with normal features. It also suggests that pretrained feature extractors can effectively disable these poisoning attacks.
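A toy version of the "synthetic shortcut" experiment (the exact recipe is assumed here, not quoted from the paper): each class is assigned its own fixed random sign pattern, so the perturbations alone are linearly separable by label and can act as shortcuts.

```python
import numpy as np

def synthetic_shortcut_perturbations(labels, shape, eps=8 / 255, num_classes=10, seed=0):
    """Give every class a fixed random +/-1 pattern and perturb along it.
    The perturbations alone are linearly separable by class, so a network can
    fit the labels from the perturbations instead of the image content."""
    rng = np.random.default_rng(seed)
    directions = rng.choice([-1.0, 1.0], size=(num_classes, *shape))   # one pattern per class
    return eps * directions[np.asarray(labels)]                         # (batch, *shape)

# Poisoned training set = clean images + class-wise shortcut noise.
images = np.random.rand(64, 3, 32, 32)
labels = np.random.randint(0, 10, size=64)
poisoned = np.clip(images + synthetic_shortcut_perturbations(labels, (3, 32, 32)), 0.0, 1.0)
```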
We give simpler, sparser, and faster algorithms for differentially private fine-tuning of large-scale pretrained language models, which achieve the state-of-the-art privacy-versus-utility trade-off on many standard NLP tasks. We propose a meta-framework for this problem, inspired by the recent success of highly parameter-efficient methods for fine-tuning. Our experiments show that differentially private adaptations of these approaches outperform previous private algorithms in three important dimensions: utility, privacy, and the computational and memory cost of private training. On many commonly studied datasets, the utility of private models approaches that of non-private ones. For example, on the MNLI dataset we achieve an accuracy of $87.8\%$ using RoBERTa-Large and $83.5\%$ using RoBERTa-Base with a privacy budget of $\epsilon=6.7$. In comparison, absent privacy constraints, RoBERTa-Large achieves an accuracy of $90.2\%$. Our findings are similar for natural language generation tasks. Privately fine-tuning GPT-2-Small, GPT-2-Medium, GPT-2-Large, and GPT-2-XL on DART achieves BLEU scores of 38.5, 42.0, 43.1, and 43.8, respectively (privacy budget $\epsilon=6.8$, $\delta=$ 1e-5), whereas the non-private baseline is $48.1$. All of our experiments suggest that larger models are better suited for private fine-tuning: while they are well known to achieve superior accuracy non-privately, we find that they also better maintain their accuracy when privacy is introduced.
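A minimal sketch of the general recipe, assuming a LoRA-style adapter as the parameter-efficient module and a hand-rolled DP-SGD step (layer names, ranks, and thresholds are illustrative and not the paper's configuration): only the small adapter parameters are trained, so only they are clipped and noised.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a small trainable low-rank adapter.
    Only A and B are trained, and hence only they receive DP noise."""
    def __init__(self, base: nn.Linear, rank=8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x):
        return self.base(x) + x @ self.A.t() @ self.B.t()

def dp_step(model, loss_fn, xs, ys, optimizer, clip=1.0, noise_multiplier=1.0):
    """One DP-SGD step over the trainable (adapter) parameters only:
    per-example gradients are clipped to `clip`, summed, noised, and averaged."""
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]
    for x, y in zip(xs, ys):                              # per-example gradients
        grads = torch.autograd.grad(loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)), params)
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = min(1.0, clip / (norm.item() + 1e-6))
        for s, g in zip(summed, grads):
            s.add_(g, alpha=scale)
    for p, s in zip(params, summed):
        noise = torch.randn_like(s) * noise_multiplier * clip
        p.grad = (s + noise) / len(xs)
    optimizer.step()
    optimizer.zero_grad()
```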
Domain adaptation (DA) approaches address domain shift and enable networks to be applied to different scenarios. Although various image DA approaches have been proposed in recent years, there is limited research towards video DA. This is partly due to the complexity of adapting the different modalities of features in videos, which include the correlation features extracted as long-term dependencies of pixels across spatiotemporal dimensions. The correlation features are highly associated with action classes and have proven effective for accurate video feature extraction through the supervised action recognition task. Yet correlation features of the same action would differ across domains due to domain shift. Therefore, we propose a novel Adversarial Correlation Adaptation Network (ACAN) to align action videos by aligning pixel correlations. ACAN aims to minimize the discrepancy between distributions of correlation information, termed the Pixel Correlation Discrepancy (PCD). Additionally, video DA research is also limited by the lack of cross-domain video datasets with larger domain shifts. We therefore introduce a novel HMDB-ARID dataset with a larger domain shift caused by a larger statistical difference between domains. This dataset is built in an effort to leverage current datasets for dark video classification. Empirical results demonstrate the state-of-the-art performance of our proposed ACAN on both existing and the new video DA datasets.
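A rough sketch of the ingredients (the exact definition of PCD is assumed here, not quoted from the paper): pixel correlations are computed from spatiotemporal feature maps, and a simple distance between source and target correlation statistics stands in for the discrepancy that ACAN would minimize adversarially while training the action classifier.

```python
import torch

def pixel_correlation(features):
    """Correlation of pixel (spatiotemporal position) responses across channels;
    features has shape (batch, channels, time, height, width)."""
    b, c, t, h, w = features.shape
    flat = features.reshape(b, c, t * h * w)
    flat = flat - flat.mean(dim=2, keepdim=True)
    corr = torch.einsum('bcp,bcq->bpq', flat, flat) / c    # (batch, P, P) correlation maps
    return corr.reshape(b, -1)

def pixel_correlation_discrepancy(source_feats, target_feats):
    """Toy stand-in for PCD: a mean-embedding distance between the source and
    target distributions of pixel-correlation features."""
    s = pixel_correlation(source_feats).mean(dim=0)
    t = pixel_correlation(target_feats).mean(dim=0)
    return (s - t).pow(2).mean()
```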
We propose a reparametrization scheme that addresses the challenges of applying differentially private SGD to large neural networks, namely 1) the huge memory cost of storing individual gradients and 2) the added noise whose magnitude notoriously grows with the model dimension. Specifically, we reparametrize each weight matrix with two \emph{gradient-carrier} matrices of small dimension and a \emph{residual} matrix. We argue that such a reparametrization keeps the forward/backward process unchanged while enabling us to compute the projected gradient without computing the gradient itself. To learn with differential privacy, we design \emph{reparametrized gradient perturbation (RGP)}, which perturbs the gradients on the gradient-carrier matrices and reconstructs an update for the original weights from the noisy gradients. Importantly, we use historical updates to find the gradient-carrier matrices, whose optimality is rigorously justified under linear regression and empirically verified on deep learning tasks. RGP significantly reduces the memory cost and improves utility. For example, we are the first to be able to apply differential privacy to the BERT model, achieving an average accuracy of $83.9\%$ across four downstream tasks with $\epsilon=8$, which is comparable to the non-private baseline while enjoying a much lower risk of privacy leakage.
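A single-example sketch of an RGP-style update (hedged: for brevity the carriers here come from an SVD of the current gradient, whereas the paper derives them from historical updates via power iterations and applies clipping per example):

```python
import torch

def rgp_update(weight, grad, carrier_rank=4, clip=1.0, noise_multiplier=1.0, lr=0.1):
    """Perturb only the small projected gradients on the carrier matrices,
    then reconstruct a full-size noisy update for the original weight."""
    # 1) Low-rank carriers L (m x r) and R (r x n) with orthonormal columns/rows.
    U, S, Vh = torch.linalg.svd(grad, full_matrices=False)
    L, R = U[:, :carrier_rank], Vh[:carrier_rank, :]
    # 2) Project the gradient onto the carriers; only these small matrices
    #    need to be stored and perturbed.
    gL, gR = grad @ R.t(), L.t() @ grad                     # (m x r), (r x n)
    # 3) Clip and add Gaussian noise to the projected gradients.
    for g in (gL, gR):
        g.mul_(min(1.0, clip / (g.norm().item() + 1e-6)))
        g.add_(torch.randn_like(g), alpha=noise_multiplier * clip)
    # 4) Reconstruct the update: G P_R + P_L G - P_L G P_R (projection onto
    #    the union of the carrier subspaces, without double counting).
    noisy_grad = gL @ R + L @ gR - L @ (L.t() @ gL @ R)
    weight.sub_(lr * noisy_grad)
    return weight
```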
Text-to-image generation in the general domain has long been an open problem, requiring both a powerful generative model and cross-modal understanding. We propose CogView, a 4-billion-parameter Transformer with a VQ-VAE tokenizer, to advance this problem. We also demonstrate fine-tuning strategies for various downstream tasks, e.g., style learning, super-resolution, text-image ranking, and fashion design, as well as methods for stabilizing pretraining, e.g., eliminating NaN losses. CogView achieves state-of-the-art FID on the blurred MS COCO dataset, outperforming previous GAN-based models and the recent similar work DALL-E.